Bandit Online Learning in Pseudo-Monotone Games with Multi-Point Pseudo-Gradient Estimate
Non-cooperative games serve as a powerful framework for capturing the
interactions among self-interested players and have broad applicability in
modeling a wide range of practical scenarios, ranging from power management to
drug delivery. Although most existing solution algorithms assume the
availability of first-order information or full knowledge of the objectives and
others' action profiles, there are situations where the only accessible
information at players' disposal is the realized objective function values. In
this paper, we devise a bandit online learning algorithm that integrates the
optimistic mirror descent scheme and multi-point pseudo-gradient estimates. We
further demonstrate that the generated actual sequence of play can converge
a.s. to a critical point if the game under study is merely coherent, without
resorting to extra Tikhonov regularization terms or additional norm conditions.
Finally, we illustrate the validity of the proposed algorithm via a
Rock-Paper-Scissors game and a least-squares estimation game.
A Bandit Learning Method for Continuous Games under Feedback Delays with Residual Pseudo-Gradient Estimate
Learning in multi-player games can model a large variety of practical
scenarios, where each player seeks to optimize its own local objective
function, which at the same time relies on the actions taken by others.
Motivated by the frequent absence of first-order information such as partial
gradients in solving local optimization problems and the prevalence of
asynchronicity and feedback delays in multi-agent systems, we introduce a
bandit learning algorithm, which integrates mirror descent, residual
pseudo-gradient estimates, and the priority-based feedback utilization
strategy, to contend with these challenges. We establish that for
pseudo-monotone plus games, the actual sequences of play generated by the
proposed algorithm converge a.s. to critical points. Compared with the existing
method, the proposed algorithm yields more consistent estimates with less
variation and allows for more aggressive choices of parameters. Finally, we
illustrate the validity of the proposed algorithm through a thermal load
management problem of building complexes.
A Study of the Duality between Kalman Filters and LQR Problems
The goal of this paper is to study a connection between the finite-horizon Kalman filtering and LQR problems for discrete-time LTI systems. Motivated by recent duality results on the LQR problem, a Lagrangian dual relation is used to prove that the Kalman filtering problem is a Lagrange dual problem of the LQR problem.
A Semidefinite Programming Formulation of the LQR Problem and Its Dual
The goal of this paper is to derive a modified formulation of the finite-horizon LQR problem that can be cast as a semidefinite programming problem (SDP). In addition, based on Lagrangian duality, its dual problem is studied. We establish connections between the proposed primal-dual conditions and existing results. As an application of the proposed results, the decentralized LQR analysis and design problems are addressed. In particular, using the structure of the derived LQR formulations, a simple, convex, sufficient surrogate problem is developed for solving decentralized LQR design problems.
Stabilizing Switched Linear Systems under Adversarial Switching
The problem of stabilizing discrete-time switched linear control systems with a continuous control input chosen by the user, against switching chosen by an adversary, is studied. It is assumed that the adversary has the advantage in that at each time it knows the user's decision on the continuous control input, but not vice versa. Stabilizability conditions and bounds on the fastest stabilizing rates are derived. Examples are given to illustrate the results.